A Robust Seedless Algorithm for Correlation Clustering
Abstract
Finding correlation clusters in arbitrary subspaces of high-dimensional data is an important and challenging research problem. Current state-of-the-art correlation clustering approaches are sensitive to the initial set of seeds and do not yield optimal results in the presence of noise. To avoid these problems, we propose the RObust SEedless Correlation Clustering (ROSECC) algorithm, which does not require selecting an initial set of seeds. Our approach incrementally partitions the data in each iteration and applies PCA to each partition independently. ROSECC does not assume the dimensionality of a cluster beforehand; instead, it automatically determines the appropriate dimensionality (and the corresponding subspace) of each correlation cluster. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed method. We also show that our method remains robust in the presence of significant levels of noise in the data.
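The abstract does not spell out how ROSECC partitions the data or how it decides a partition's dimensionality. As a minimal sketch of the per-partition PCA step only, assuming a widely used explained-variance criterion (not necessarily the one ROSECC uses), the code below computes the covariance matrix of one partition, obtains its eigenvalues with the Jacobi eigenvalue method, and reports the smallest number of principal components whose eigenvalues explain at least a fraction `alpha` of the variance. The function names, the Jacobi solver, and the default `alpha = 0.85` are all illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(points: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix (divisor n - 1) of one partition of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (n - 1) for c in row] for row in cov]


def jacobi_eigenvalues(a: Matrix, tol: float = 1e-20, max_sweeps: int = 50) -> List[float]:
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    m = [row[:] for row in a]  # work on a copy so the input is not modified
    d = len(m)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(m[i][j] * m[i][j] for i in range(d) for j in range(d) if i != j)
        if off < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if m[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # m <- J^T m J: first rotate columns p and q ...
                for k in range(d):
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                # ... then rotate rows p and q.
                for k in range(d):
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return sorted((m[i][i] for i in range(d)), reverse=True)


def correlation_dimensionality(eigenvalues: Sequence[float], alpha: float = 0.85) -> int:
    """Smallest k such that the k largest eigenvalues explain >= alpha of the total variance.

    Hypothetical helper: alpha = 0.85 is an assumed default, not a value taken from ROSECC.
    """
    total = sum(eigenvalues)
    if total <= 0.0:
        return 0  # degenerate partition: all points identical
    explained = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        explained += lam
        if explained / total >= alpha:
            return k
    return len(eigenvalues)


if __name__ == "__main__":
    # Hypothetical partitions in R^3: points on a 1-D line and on a 2-D plane.
    line = [(t, 2 * t, 3 * t) for t in range(10)]
    plane = [(x, y, x + y) for x in range(5) for y in range(5)]
    print(correlation_dimensionality(jacobi_eigenvalues(covariance(line))))   # 1
    print(correlation_dimensionality(jacobi_eigenvalues(covariance(plane))))  # 2
```

For the plane example the eigenvalues are in the ratio 3 : 1 : 0, so one component explains only 75% of the variance and the rule settles on two dimensions. The pure-Python Jacobi solver only keeps the sketch dependency-free; with NumPy, `numpy.linalg.eigh` would be the usual choice for a symmetric covariance matrix.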
Similar resources
Detecting compact binary coalescences with seedless clustering
Compact binary coalescences are a promising source of gravitational waves for second-generation interferometric gravitational-wave detectors. Although matched filtering is the optimal search method for well-modeled systems, alternative detection strategies can be used to guard against theoretical errors (e.g., involving new physics and/or assumptions about spin or eccentricity) while providing ...
Searching for gravitational-wave transients with a qualitative signal model: seedless clustering strategies
Gravitational-wave bursts are observable as bright clusters of pixels in spectrograms of strain power. Clustering algorithms can be used to identify candidate gravitational-wave events. Clusters are often identified by grouping together seed pixels in which the power exceeds some threshold. If the gravitational-wave signal is long-lived, however, the excess power may be spread out over many pix...
Robust Method for E-Maximization and Hierarchical Clustering of Image Classification
We developed a new semi-supervised EM-like algorithm that is given the set of objects present in each training image, but does not know which regions correspond to which objects. We have tested the algorithm on a dataset of 860 hand-labeled color images using only color and texture features, and the results show that our EM variant is able to break the symmetry in the initial solution. We compared...
An improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for structure in a dataset. In other words, clustering is the grouping of similar data: the similarity among the data within a cluster is maximal, and the similarity among the data in different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...